Decision Tree Pruning: Biased or Optimal?
Authors
Abstract
We evaluate the performance of weakest-link pruning of decision trees using cross-validation. This technique maps tree pruning into a problem of tree selection: find the best (i.e., the right-sized) tree from a set of trees ranging in size from the unpruned tree to a null tree. For samples with at least 200 cases, extensive empirical evidence supports the following conclusions relative to tree selection: (a) 10-fold cross-validation is nearly unbiased; (b) not pruning a covering tree is highly biased; (c) 10-fold cross-validation is consistent with optimal tree selection for large sample sizes; and (d) the accuracy of tree selection by 10-fold cross-validation is largely dependent on sample size, irrespective of the population distribution.
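As a rough illustration of this tree-selection view (a minimal sketch, not the paper's original experiments), the snippet below uses scikit-learn's cost-complexity (weakest-link) pruning path to enumerate candidate subtrees from the unpruned tree down to the null tree, then keeps the one with the best 10-fold cross-validation estimate; the synthetic dataset and all parameter choices are assumptions for illustration.

```python
# Minimal sketch: weakest-link (cost-complexity) pruning viewed as
# tree selection, with the "right-sized" tree chosen by 10-fold
# cross-validation.  scikit-learn and the synthetic data are
# illustrative assumptions, not the paper's setup.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# The pruning path yields a nested sequence of candidate subtrees,
# from the unpruned (covering) tree down to the null tree (root only),
# each indexed by a complexity parameter alpha.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)

# Tree selection: estimate each candidate's accuracy by 10-fold
# cross-validation and keep the alpha with the best estimate.
cv_scores = [
    cross_val_score(
        DecisionTreeClassifier(random_state=0, ccp_alpha=alpha), X, y, cv=10
    ).mean()
    for alpha in path.ccp_alphas
]
best_alpha = path.ccp_alphas[int(np.argmax(cv_scores))]
final_tree = DecisionTreeClassifier(random_state=0, ccp_alpha=best_alpha).fit(X, y)
print(f"selected alpha={best_alpha:.4f}, leaves={final_tree.get_n_leaves()}")
```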
Similar resources
Decision Tree Pruning as a Search in the State Space
This paper presents a study of one particular problem of decision tree induction, namely (post-)pruning, with the aim of finding a common framework for the plethora of pruning methods that have appeared in the literature. Given a tree T_max to prune, a state space is defined as the set of all subtrees of T_max to which only one operator, called the any-depth branch pruning operator, can be applied in several ways in or...
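To make the search-space view concrete, here is a minimal sketch (using an assumed nested-dict tree representation, not the paper's formalism) of an any-depth branch pruning operator: each application replaces one internal node, at any depth, by a leaf, and the successor states of a tree are all trees reachable by one such application.

```python
# Sketch of the any-depth branch pruning operator over an assumed
# dict-based tree representation.  The helper names are hypothetical.
from copy import deepcopy

def internal_nodes(tree, path=()):
    """Yield the paths (tuples of child indices) of all internal nodes."""
    if tree.get("children"):
        yield path
        for i, child in enumerate(tree["children"]):
            yield from internal_nodes(child, path + (i,))

def prune_at(tree, path):
    """Return a copy of the tree with the node at `path` collapsed into a leaf."""
    pruned = deepcopy(tree)
    node = pruned
    for i in path:
        node = node["children"][i]
    node["children"] = []          # the whole branch becomes a leaf
    return pruned

def successors(tree):
    """All trees reachable by one application of the pruning operator."""
    return [prune_at(tree, p) for p in internal_nodes(tree)]

# Tiny example tree: a root with one internal child and one leaf child.
T_max = {"label": "root", "children": [
    {"label": "a", "children": [
        {"label": "a1", "children": []},
        {"label": "a2", "children": []}]},
    {"label": "b", "children": []}]}
print(len(successors(T_max)))      # two internal nodes -> two successor states
```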
On the Complexity of Learning Decision Trees
Various factors affecting decision tree learning time are explored. The factors which consistently affect accuracy are those which directly or indirectly (as in the handling of continuous attributes) allow a greater variety of potential trees to be explored. Other factors, e.g., pruning and choice of heuristics, generally have little effect on accuracy, but significantly affect learning time. We pro...
Evaluation of liquefaction potential based on CPT results using C4.5 decision tree
The prediction of liquefaction potential of soil due to an earthquake is an essential task in Civil Engineering. The decision tree is a tree structure consisting of internal and terminal nodes which process the data to ultimately yield a classification. C4.5 is a known algorithm widely used to design decision trees. In this algorithm, a pruning process is carried out to solve the problem of the...
An Exact Probability Metric for Decision Tree Splitting
ID3's information gain heuristic is well-known to be biased towards multi-valued attributes. This bias is only partially compensated by the gain ratio used in C4.5. Several alternatives have been proposed, notably orthogonality and Beta. Gain ratio and orthogonality are strongly correlated, and all of the metrics share a common bias towards splits with one or more small expected values, under c...
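A small worked example (not from the paper) of the bias being described: an identifier-like attribute with a distinct value per case receives the maximum information gain even though it cannot generalize, while C4.5's gain ratio divides by the split information and only partially corrects this preference. The toy data and function names below are illustrative assumptions.

```python
# Illustration of information gain's bias towards multi-valued
# attributes, and of gain ratio as a partial correction.
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(values, labels):
    n = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [l for x, l in zip(values, labels) if x == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

def gain_ratio(values, labels):
    split_info = entropy(values)          # intrinsic information of the split
    return info_gain(values, labels) / split_info if split_info else 0.0

labels    = [0, 0, 0, 1, 1, 1]
record_id = [1, 2, 3, 4, 5, 6]            # unique per case: splits "perfectly"
binary    = [0, 0, 0, 1, 1, 0]            # genuinely informative attribute

print(info_gain(record_id, labels), info_gain(binary, labels))    # 1.0 vs ~0.46
print(gain_ratio(record_id, labels), gain_ratio(binary, labels))  # ~0.39 vs ~0.50
```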
Experiments with an innovative tree pruning algorithm
The pruning phase is one of the necessary steps in decision tree induction. Existing pruning algorithms tend to have some or all of the following difficulties: 1) lack of theoretical support; 2) high computational complexity; 3) dependence on validation; 4) complicated implementation. The 2-norm pruning algorithm proposed here addresses all of the above difficulties. This paper demonstrates the...